Abstract

Formalizing  linguists’  intuitions  of  language  change  as  a  dynamical  system,  we  quantify  the  time  course 
of  language  change  including  sudden  vs.  gradual  changes  in  languages.  We  apply  the  computer  model  to 
the  historical  loss  of  Verb  Second  from  Old  French  to  modern  French,  showing  that  otherwise  adequate 
grammatical  theories  can  fail  our  new  evolutionary  criterion. 
1  Introduction


Language scientists have long been occupied with describing
phonological, syntactic, and semantic change, often appealing
to an analogy between language change and evolution, but rarely
going beyond this. For instance, Lightfoot (1991, chapter 7,
pp. 163-65ff.) talks about language change in this way: “Some
general properties of language change are shared by other
dynamic systems in the natural world”^1. Here we formalize these
intuitions, to the best of our knowledge for the first time, as
a concrete, computational, dynamical systems model, and
investigate its consequences. Specifically, we show that a
computational population language change model emerges as a
natural consequence of individual language learnability. Our
computational model establishes the following:

•  Learnability is a well-known criterion for the adequacy of
grammatical theories. Our model provides an evolutionary
criterion: by comparing the trajectories of dynamical linguistic
systems to historically observed trajectories, one can determine
the adequacy of linguistic theories or learning algorithms.

•  We derive explicit dynamical systems corresponding to
parametrized linguistic theories (e.g., the Head First/Final
parameter in HPSG or GB grammars) and memoryless language
learning algorithms (e.g., gradient ascent in parameter space).

•  We illustrate the use of dynamical systems as a research tool
by considering the loss of Verb Second position in Old French as
compared to Modern French. We demonstrate by computer modeling
that one grammatical parameterization in the literature does not
seem to permit this historical change, while another does. We
can more accurately model the time course of language change. In
particular, in contrast to Kroch (1989) and others, who mimic
population biology models by imposing an S-shaped logistic
change by assumption, we explain the time course of language
change, and show that it need not be S-shaped. Rather,
language-change envelopes are derivable from more fundamental
properties of dynamical systems; sometimes they are S-shaped,
but they can also be nonmonotonic.

•  We examine by simulation and traditional phase-space plots
the form and stability of possible “diachronic envelopes” given
varying conditions of alternative language distributions,
language acquisition algorithms, parameterizations, input noise,
and sentence distributions.

2  The  Acquisition-Based  Model  of 
Language  Change 

We first show how a combination of a grammatical theory and a
learning paradigm leads directly to a formal dynamical systems
model of language change.

^1 One notable exception is Kroch, 1989, whose account we
explore below.

Figure 1: Time evolution of grammars using a greedy learning
algorithm. The x-axis is generation time, e.g., units of 20-30
years. The y-axis is the percentage of the population speaking
the languages as indicated on the curves, e.g., S(ubject) V(erb)
O(bject) with no Verb Second = SVO−V2.

First, informally, consider an adult population speaking a
particular language^2. Individual children attempt to attain
their caretakers' target grammar. After a finite number of
examples, some are successful, but others may misconverge. The
next generation will therefore no longer be linguistically
homogeneous. The third generation of children will hear
sentences produced by the second (a different distribution), and
they, in turn, will attain a different set of grammars. Over
generations, the linguistic composition evolves as a dynamical
system. In the remainder of this paper we formalize this
intuition, obtaining detailed figures like the one in Figure 1,
showing the evolution of language types over successive
generations within a single community. We return to the details
later, but let us first formalize our intuitions.

Grammatical Theory, Learning Algorithm, Sentence Distributions

1. Denote by $\mathcal{G}$ a family of possible (target)
grammars. Each grammar $g \in \mathcal{G}$ defines a language
$L(g) \subseteq \Sigma^*$ over some alphabet $\Sigma$ in the
usual way.

2. Denote by $P$ the distribution with which sentences of
$\Sigma^*$ are presented to the individual learner (child). More
specifically, let $P_i$ be the distribution with which sentences
of the $i$th grammar $g_i \in \mathcal{G}$ are presented if
there is a speaker of $g_i$ in the adult population. Thus, if
the adult population is linguistically homogeneous (with grammar
$g_1$) then $P = P_1$. If the adult population speaks 50 percent
$L(g_1)$ and 50 percent $L(g_2)$ then
$P = \frac{1}{2} P_1 + \frac{1}{2} P_2$.

3. Denote by $\mathcal{A}$ the learning algorithm that children
use to hypothesize a grammar on the basis of input data. If
$d_n$ is a presentation sequence of $n$ randomly drawn examples,
then learnability (Gold, 1967) requires (for every target
grammar $g_t$),

$$\mathrm{Prob}[\mathcal{A}(d_n) = g_t] \longrightarrow_{n \to \infty} 1$$

^2 In our framework, this implies that the adult members of this
population have internalized the same grammar.

We  now  define  a  dynamical  system  by  providing  its 
two  necessary  components: 

A State Space ($\mathcal{S}$): a set of system states. Here, the
state space is the space of possible linguistic compositions of
the population. Each state is described by a distribution
$P_{pop}$ on $\mathcal{G}$ describing the language spoken by the
population.^3

An Update Rule: how the system states change from one time step
to the next. Typically, this involves specifying a function,
$f$, that maps $s_t \in \mathcal{S}$ to $s_{t+1}$.^4

In our case the update rule can be derived directly from the
learning algorithm $\mathcal{A}$, because learning can change
the distribution of languages spoken from one generation to the
next. For example, given $P_{pop,t}$, we see that any
$\omega \in \Sigma^*$ is presented with probability

$$P(\omega) = \sum_i P_i(\omega) P_{pop,t}(i).$$
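As a concrete illustration of this mixture, the following Python sketch computes $P(\omega)$ for a toy population. The two sentence distributions and the function name `mixture_distribution` are hypothetical, chosen only for illustration; each $P_i$ is assumed uniform on a two-sentence language.

```python
from fractions import Fraction

def mixture_distribution(sentence_dists, population):
    """Return P(w) = sum_i P_i(w) * P_pop(i) as a dict over sentences.

    sentence_dists: list of dicts mapping sentence -> probability under P_i
    population:     list of weights P_pop(i), assumed to sum to 1
    """
    mixed = {}
    for dist, weight in zip(sentence_dists, population):
        for sentence, prob in dist.items():
            mixed[sentence] = mixed.get(sentence, 0) + weight * prob
    return mixed

# Hypothetical toy languages (word-order patterns); each P_i is uniform.
P1 = {"S V O": Fraction(1, 2), "Adv S V O": Fraction(1, 2)}
P2 = {"S V O": Fraction(1, 2), "Adv V S O": Fraction(1, 2)}

# A population split evenly between the two grammars.
P = mixture_distribution([P1, P2], [Fraction(1, 2), Fraction(1, 2)])
for sentence, prob in sorted(P.items()):
    print(sentence, prob)
```

Exact fractions are used here only so that the mixture can be checked by hand; floating-point weights would work equally well.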

The learning algorithm $\mathcal{A}$ uses the linguistic data
($n$ examples, indicated by $d_n$) and conjectures hypotheses
($\mathcal{A}(d_n) \in \mathcal{G}$). One can, in principle,
compute the probability^5 with which the learner will develop an
arbitrary hypothesis, $h_i$, after $n$ examples:

Finite Sample: $\mathrm{Prob}[\mathcal{A}(d_n) = h_i] = p_n(h_i)$  (1)

Learnability requires $p_n(g_t)$ to go to 1, for the unique
target grammar, $g_t$, if such a grammar exists. In general,
there is no unique target grammar since we have nonhomogeneous
linguistic populations. However, the following limiting behavior
can still exist:

Limiting Sample: $\lim_{n \to \infty} \mathrm{Prob}[\mathcal{A}(d_n) = h_i] = p_i$  (2)

Thus, with probability $p_n(h_i)$,^6 an arbitrary child will
have internalized grammar $h_i$. Hence, in the next generation,
a proportion $p_n(h_i)$ of the population has grammar $h_i$,
i.e., the linguistic composition of the next generation is given
by $P_{pop,t+1}(h_i) = p_i$ (or $p_n(h_i)$). In this fashion, we
have an update rule.
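The update rule can also be approximated by direct simulation. The following Python sketch is only an illustration under stated assumptions: the two toy languages, the error-driven learner `toy_learner`, and the function `next_population` are hypothetical and are not the parameterized systems studied later in this paper. It estimates the next generation's composition by letting a finite number of simulated children each hear a fixed number of sentences drawn from the current population mix.

```python
import random

# Hypothetical toy languages: lists of sentence patterns (not from the paper).
LANGUAGES = [
    ["S V O", "Adv S V O"],   # grammar 0
    ["S V O", "Adv V S O"],   # grammar 1
]

def toy_learner(data, rng):
    """Memoryless, error-driven learner (an assumed stand-in for A).

    Starts from a random hypothesis and switches grammar only when the
    current one cannot generate the sentence and the other one can.
    """
    hypothesis = rng.randrange(2)
    for sentence in data:
        other = 1 - hypothesis
        if sentence not in LANGUAGES[hypothesis] and sentence in LANGUAGES[other]:
            hypothesis = other
    return hypothesis

def next_population(pop, n_examples, n_children, rng):
    """Estimate the next population mix by simulating individual children."""
    counts = [0, 0]
    for _ in range(n_children):
        data = []
        for _ in range(n_examples):
            # Pick a speaker according to the current mix, then a sentence
            # uniformly from that speaker's language (P_i assumed uniform).
            speaker = 0 if rng.random() < pop[0] else 1
            data.append(rng.choice(LANGUAGES[speaker]))
        counts[toy_learner(data, rng)] += 1
    return [c / n_children for c in counts]

rng = random.Random(0)
pop = [0.5, 0.5]
for generation in range(5):
    pop = next_population(pop, n_examples=20, n_children=2000, rng=rng)
    print(generation + 1, pop)
```

Because the number of simulated children is finite, the estimate carries sampling noise; the Markov chain formulation of the next section avoids this by computing the same probabilities exactly.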


^3 As usual, one needs to be able to define a σ-algebra on the
space of grammars, and so on. This is unproblematic for the
cases considered in this paper because the set of grammars is
finite.

^4 In general, this mapping could be fairly complicated. For
example, it could depend on previous states, future states, and
so forth; for reasons of space we do not consider all
possibilities here. For reference, see Strogatz (1993).

^5 The finite sample situation is always well defined; see
Niyogi, 1994.

^6 Or $p_i$, depending upon whether one wishes to carry out a
finite sample or a limiting sample analysis for learning within
one generation.


Generality of the approach. Note that such a dynamical system
exists for every choice of $\mathcal{A}$, $\mathcal{G}$, and
$P_i$ (relative to the constraints mentioned earlier). In short,
then,

$$(\mathcal{G}, \mathcal{A}, \{P_i\}) \longrightarrow \mathcal{D} \ \text{(dynamical system)}$$

Importantly, this formulation does not assume any particular
linguistic theory, learning algorithm, or distribution over
sentences.

3  Language  Change  in  Parametric 
Systems 

We next instantiate our abstract system by modeling some
specific cases. Suppose we have a “parameterized” grammatical
theory, such as HPSG or GB, with $n$ boolean-valued parameters
and a space $\mathcal{G}$ with $2^n$ different languages (in
this case, equivalently, grammars). Further take the assumptions
of Berwick and Niyogi (1994) regarding sentence distributions
and learning: $P_i$ is uniform on unembedded sentences generated
by $g_i$, and $\mathcal{A}$ is single step, gradient ascent. To
derive the relevant update rule we need the following theorem
and corollaries, given here without proof (see Niyogi, 1994):

Theorem 1 Any memoryless incremental algorithm that attempts to
set the values of the parameters on the basis of example
sentences can be modeled exactly by a Markov chain. This Markov
chain has $2^n$ states, with each state corresponding to a
particular grammar. The transition probabilities depend upon the
distribution $P$ with which sentences occur, and the learning
algorithm $\mathcal{A}$ (which is essentially a recursive
function from data to hypotheses).

Corollary 1 The probability that the learner internalizes
hypothesis $h_i$ after $m$ examples (solution to equation 1) is
given by

$$\mathrm{Prob}[\text{Learner's hypothesis} = h_i \in \mathcal{G} \text{ after } m \text{ examples}] = \left\{ \frac{1}{2^n} (1, \ldots, 1)' \, T^m \right\}[i]$$
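The finite-sample computation in Corollary 1 amounts to running the Markov chain for $m$ steps from a uniform start. A minimal Python sketch follows; the 2-state matrix is a hypothetical example (a one-parameter system), the function names are ours, and the leading factor is read here as a uniform distribution over initial hypotheses.

```python
def row_times_matrix(row, matrix):
    """Multiply a row vector by a square matrix: row' * matrix."""
    size = len(matrix)
    return [sum(row[i] * matrix[i][j] for i in range(size)) for j in range(size)]

def finite_sample_distribution(T, m):
    """Return the vector (1/N)(1,...,1)' T^m, where N is the number of states.

    Entry i is the probability that a learner whose initial hypothesis is
    chosen uniformly at random holds hypothesis i after m examples.
    """
    size = len(T)
    dist = [1.0 / size] * size           # uniform initial hypothesis
    for _ in range(m):
        dist = row_times_matrix(dist, T)  # one example = one Markov step
    return dist

# Hypothetical row-stochastic transition matrix for two grammars.
T = [[0.9, 0.1],
     [0.4, 0.6]]

for m in (0, 1, 10, 100):
    print(m, finite_sample_distribution(T, m))
```

Repeated vector-matrix products are used instead of forming the matrix power explicitly, which keeps the cost at $m$ products of a vector with a $2^n \times 2^n$ matrix.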

Similarly,  making  use  of  limiting  distributions  of 
Markov  chains  (see  Resnick,  1992)  one  can  obtain  the 
following: 

Corollary 2 The probability that the learner internalizes
hypothesis $h_i$ “in the limit” (solution to equation 2) is
given by

$$\mathrm{Prob}[\text{Learner's hypothesis} = h_i \text{ “in the limit”}] = \left\{ (1, \ldots, 1)' \, (I - T + \mathrm{ONE})^{-1} \right\}[i]$$

where ONE is a $2^n \times 2^n$ matrix with all ones.
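The limiting formula of Corollary 2 can be evaluated without forming an explicit inverse, by solving one linear system. The Python sketch below is illustrative only: the helper names and the 2-state matrix are hypothetical, and it assumes the chain has a unique limiting distribution so that the matrix I - T + ONE is invertible.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def limiting_distribution(T):
    """Return (1,...,1)' (I - T + ONE)^{-1} for a row-stochastic matrix T."""
    n = len(T)
    # M = I - T + ONE, built entry by entry.
    M = [[(1.0 if i == j else 0.0) - T[i][j] + 1.0 for j in range(n)]
         for i in range(n)]
    # The row equation x' M = (1,...,1)' is the same as M^T x = (1,...,1).
    M_transposed = [[M[j][i] for j in range(n)] for i in range(n)]
    return solve_linear(M_transposed, [1.0] * n)

# Hypothetical row-stochastic transition matrix for two grammars.
T = [[0.9, 0.1],
     [0.4, 0.6]]
pi = limiting_distribution(T)
print(pi)
```

The formula works because a stationary row vector satisfies both π'T = π' and π'ONE = (1,...,1)', so π'(I - T + ONE) = (1,...,1)'.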

This  yields  our  required  dynamical  system  for 
parameter-based  theories: 

1. Let $\Pi_1$ be the initial population mix. Assume the $P_i$'s
as above. Compute $P$ from $\Pi_1$ and the $P_i$'s.

2. Compute $T$ (transition matrix) according to the theorem.

3. Use the corollaries to the theorem to obtain the update rule,
to get the population mix $\Pi_2$.

4.  Repeat  for  the  next  generation. 
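The four steps above can be sketched on a deliberately small system. Everything specific in the following Python code is an assumption made for illustration: there are only two grammars, `a` is the probability that a speaker of grammar 1 produces a sentence that grammar 2 cannot generate (and `b` the reverse), and the learner is memoryless and error-driven, switching grammar only on a sentence its current hypothesis cannot generate. This is not the 3-parameter or 5-parameter system studied below.

```python
def transition_matrix(pop, a, b):
    """Steps 1-2: the learner's Markov chain for the population mix `pop`.

    Under the mixture P, a learner at grammar 1 leaves only on hearing a
    sentence that grammar 2 alone generates, which has probability pop[1] * b.
    """
    to_g2 = pop[1] * b
    to_g1 = pop[0] * a
    return [[1 - to_g2, to_g2],
            [to_g1, 1 - to_g1]]

def next_generation(pop, a, b, k):
    """Step 3: apply Corollary 1 with k examples and a uniform initial hypothesis."""
    T = transition_matrix(pop, a, b)
    dist = [0.5, 0.5]
    for _ in range(k):
        dist = [dist[0] * T[0][0] + dist[1] * T[1][0],
                dist[0] * T[0][1] + dist[1] * T[1][1]]
    return dist

def evolve(pop, a, b, k, generations):
    """Step 4: repeat the update and record the whole trajectory."""
    history = [pop]
    for _ in range(generations):
        pop = next_generation(pop, a, b, k)
        history.append(pop)
    return history

# Hypothetical parameter values; start from a homogeneous grammar-1 population.
for t, mix in enumerate(evolve([1.0, 0.0], a=0.3, b=0.5, k=8, generations=10)):
    print(t, [round(x, 4) for x in mix])
```

Even from a homogeneous start, a finite number of examples `k` leaves some learners at the wrong grammar, so the second generation is already mixed, as described informally in Section 2.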


4  Example  1:  A  Three  Parameter 
System 

Let us consider a specific example to illustrate the derivation
of the previous section: the 3-parameter syntactic subsystem
described in Gibson and Wexler (1994) and Niyogi and Berwick
(1994). Specifically, posit 3 Boolean parameters, Specifier
first/final; Head first/final; Verb second allowed or not,
leading to 8 possible grammars/languages (English and French,
SVO−Verb second; Bengali and Hindi, SOV−Verb second; German and
Dutch, SOV+Verb second; and so forth). The learning algorithm is
single-step gradient ascent. For the moment, take $P_i$ to be a
uniform distribution on unembedded sentences in the language.
Let us consider some results we obtain by simulating the
resulting dynamical systems by computer. Our key results are
these:

1. All +Verb second populations remain stable over time.
Non-Verb second populations tend to gain Verb second over time
(e.g., English-type languages change to a more German type),
contrary to historically observed phenomena (loss of Verb second
in both French and English) and linguistic intuition (Lightfoot,
1991). This evolutionary behavior suggests that either the
grammatical theory or the learning algorithm is incorrect, or
both.

2. Rates of change can vary from gradual S-shaped curves
(Fig. 2) to more sudden changes (Fig. 3).

3. Diachronic envelopes are often logistic, but not always. Note
that in some alternative models of language change, the logistic
shape has sometimes been assumed as a starting point; see, e.g.,
Kroch (1982, 1989). However, Kroch concedes that “unlike in the
population biology case, no mechanism of change has been
proposed from which the logistic form can be deduced”. On the
contrary, we propose that the logistic form is derivative, in
that it sometimes arises from more fundamental assumptions about
the grammatical theory, acquisition algorithm, and sentence
distributions. Sometimes a logistic form is not even observed,
as in Fig. 3.

4.  In  many  cases  the  homogeneous  population  splits 
into  stable  linguistic  groups. 
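How a logistic envelope can arise as a derived rather than an assumed property (result 3) can be seen in a system far smaller than the one simulated here. The Python sketch below rests entirely on illustrative assumptions: two grammars, a memoryless error-driven learner with unbounded maturation time, and hypothetical values `a` and `b` for the probability that a speaker of one grammar produces a sentence the other grammar cannot generate. Under these assumptions the learner's limiting distribution gives the update coded in `limit_update`, the odds of grammar 1 are multiplied by a/b each generation, and the trajectory is therefore exactly logistic in time.

```python
import math

def limit_update(p1, a, b):
    """One generation of the toy system with unbounded maturation time.

    p1 is the current share of grammar-1 speakers; the result is the
    stationary probability of grammar 1 for the learner's Markov chain.
    """
    return p1 * a / (p1 * a + (1 - p1) * b)

def logistic(t, p0, a, b):
    """Closed form: the log-odds grow linearly, by log(a/b) per generation."""
    log_odds = math.log(p0 / (1 - p0)) + t * math.log(a / b)
    return 1 / (1 + math.exp(-log_odds))

p0, a, b = 0.01, 0.6, 0.3   # hypothetical values
p = p0
trajectory = []
for t in range(15):
    trajectory.append((t, p, logistic(t, p0, a, b)))
    p = limit_update(p, a, b)

for t, iterated, closed_form in trajectory:
    print(t, round(iterated, 6), round(closed_form, 6))
```

The two printed columns agree up to rounding error. Nothing in this argument extends automatically to finite maturation times or to systems with more grammars, which is consistent with the non-logistic envelopes reported above.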

A variant of the learning algorithm (non-single step, gradient
ascent) yields Figure 1, shown at the beginning of this paper.
Here again, populations tend to gain Verb second over time.

Next, see Fig. 4 for the effect of maturation time on
evolutionary trajectories.

Finally, so far we have assumed that the $P_i$'s were uniform.
Fig. 5 shows the evolution of the L2 (VOS +V2) speakers as $p$
varies.

4.1  Nonhomogeneous  Populations 

Note that instead of starting with homogeneous populations, one
could consider any nonhomogeneous initial condition, e.g., a
mixture of English and German speakers. Each such initial
condition results in a grammatical trajectory as shown in
Fig. 6. One typically characterizes dynamical systems by their
phase-space plots. These contain all the trajectories
corresponding to different initial conditions, exhibited in
Fig. 7.


Figure 2: Percentage of the population speaking languages of the
basic forms V(erb) O(bject) S(ubject) with and without Verb
second. The evolution has been shown up to 20 generations, as
the proportions do not vary significantly thereafter. Notice the
“S”-shaped nature of the curve (Kroch, 1989, imposes such a
shape by fiat using models from population biology, while we
derive this form as an emergent property of our dynamical model,
given varying starting conditions). Also notice the region of
maximum change as the Verb second parameter is slowly set by an
increasing proportion of the population, with no external
influence.


Finally,  the  following  theorem  characterizes  stable 
nonhomogeneous  populations: 

Theorem 2 (Finite Case) A fixed point (stable point) of the
grammatical dynamical system (obtained by a memoryless learner
operating on the 3-parameter space with $k$ examples to choose
its mature hypothesis) is a solution of the following equation:

$$\Pi' = (\pi_1, \ldots, \pi_8) = \frac{1}{8} (1, \ldots, 1)' \left( \sum_{i=1}^{8} \pi_i T_i \right)^{k}$$

If the learner were given infinite time to choose its
hypothesis, then the fixed point is given by

$$\Pi' = (\pi_1, \ldots, \pi_8) = (1, \ldots, 1)' \left( I - \sum_{i=1}^{8} \pi_i T_i + \mathrm{ONE} \right)^{-1}$$

where ONE is the $8 \times 8$ matrix with all its entries equal
to 1.

Proof (Sketch): Both equations are obtained simply by setting
$\Pi(t+1) = \Pi(t)$. ∎

Remark: Strogatz (1993) suggests that higher dimensional
nonlinear mappings are likely to be chaotic. Since our systems
fall into such a class, this possible chaotic behavior needs to
be investigated further; we leave this for future publications.

5  The  Case  of  Modern  French 

We briefly consider a different parametric system (studied by
Clark and Roberts, 1993) as a test of our model’s ability to
impose a diachronic criterion on grammatical theories. The
historical context in which we study this is the evolution of
Modern French from Old French, in particular, the loss of Verb
second.

Figure 3: Percentage of the population speaking languages SVO
−Verb second (English) and VOS +Verb second as it evolves over
the number of generations. Notice the sudden shift over a space
of 3-4 generations.

Loss  of  Verb-Second  (from  Clark  and  Roberts,  1993) 

Mod.  *Puis  entendirent-ils  un  coup  de  tonerre. 
then  they  heard  a  clap  of  thunder. 

Old  Lors  oirent  ils  venir  un  escoiz  de  tonoire. 
then  they  heard  come  a  clap  of  thunder

Recall that simulations in the previous section indicated an
(historically incorrect) tendency to gain Verb second over time.
We now consider Clark and Roberts’ (1993) alternative
5-parameter grammatical theory. These parameters include: (1)
Null subjects or not; (2) Verb second; and three other binary
parameters that we need not detail here, yielding 32 possible
languages (grammars). It has been generally argued that in the
Middle French period, word forms like Adv(erb) V(erb) S(ubject)
decreased in frequency, while others like Adv S V increased,
eventually bringing about a loss of Verb second. We can now test
this hypothesis with the model, varying initial conditions about
population mixtures, foreign speakers, etc.

Starting from just Old French, our model shows that, even
without foreign intrusion, eventually speakers of Old French die
out altogether, and within 20 generations, 15 percent of the
speakers have lost Verb second completely; see Fig. 8. However,
note that this is not sufficient to attain Modern French, and
the change is too slow. In order to more closely duplicate the
historically observed trajectory, we consider an initial
condition more like that actually found: a mix of Old French and
data from Modern French (reproducing the intrusion of foreign
speakers and reproducing data similar to that obtained from the
Middle French period; see Clark and Roberts, 1993, for
justification).




Figure 4: Time evolution of linguistic composition for the
situations where the learning algorithm used is gradient ascent.
Only the percentage of people speaking V(erb) O(bject)
S(ubject) (+Verb second) is shown. The initial population is
homogeneous and speaks VOS (−V2). The maturational time (number
of sentences the child hears before internalizing a grammar) is
varied through 8, 16, 32, 64, 128, 256, giving rise to six
curves. The curve with the highest initial rate of change
corresponds to the situation where only 8 examples were allowed
to the learner to develop its mature hypothesis. The initial
rate of change decreases as the maturation time N increases. The
value at which these curves asymptote also seems to vary with
the maturation time, and increases monotonically with it.

Figure 5: The evolution of V(erb) O(bject) S(ubject) +Verb
second speakers in a community given different sentence
distributions, $P_i$'s. The $P_i$'s were perturbed (with
parameter $p$ denoting the extent of the perturbation) around a
uniform distribution. The algorithm used was single-step,
gradient ascent. The initial population was homogeneous, with
all members speaking a V(erb) O(bject) S(ubject) −Verb second
type language. Curves for p = 0.05, 0.75, and 0.95 have been
plotted as solid lines. If we wanted the population to
completely lose the Verb second parameter, the optimal choice of
p is 0.75 (not 1 as expected).

Figure 6: Subspace of a phase-space plot. The plot shows the
number of speakers of V(erb) O(bject) S(ubject) (−Verb second
and +Verb second) as t varies. The learning algorithm was single
step, gradient ascent.

Figure 7: Subspace of a phase-space plot. The plot shows the
number of speakers of V(erb) O(bject) S(ubject) (−Verb second
and +Verb second) as t varies. The learning algorithm was single
step, gradient ascent. The different curves correspond to
grammatical trajectories for different initial conditions.

Figure 8: Evolution of speakers of different languages in a
population starting off with speakers only of Old French. The
“p” settings may be ignored here.

Given this new initial condition, Fig. 9 shows the proportion of
speakers losing Verb second after one generation as a function
of the proportion of sentences from the “foreign” Modern French
source. Surprisingly small proportions of Modern French cause a
disproportionate number of speakers to lose Verb second,
corresponding closely to the historically observed rapid change.
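The shape of this experiment, a one-generation sweep over the share of input from a second source, can be sketched as follows. The Python code is not the 5-parameter system of Clark and Roberts: it assumes a hypothetical two-grammar reduction (one Verb second grammar, one without), a memoryless error-driven learner with a uniform initial hypothesis, and made-up values for `a`, `b`, and `k`, so its numbers say nothing about French.

```python
def lost_v2_after_one_generation(foreign_share, a, b, k):
    """Share of learners ending at the non-V2 grammar after k examples.

    foreign_share: fraction of input sentences from the non-V2 source
    a: probability that a V2 sentence cannot be generated without V2
    b: probability that a non-V2 sentence cannot be generated with V2
    (a, b, and k are hypothetical parameters of this sketch)
    """
    to_non_v2 = foreign_share * b      # V2 learner hears a non-V2-only sentence
    to_v2 = (1 - foreign_share) * a    # non-V2 learner hears a V2-only sentence
    v2, non_v2 = 0.5, 0.5              # uniform initial hypothesis (assumption)
    for _ in range(k):
        v2, non_v2 = (v2 * (1 - to_non_v2) + non_v2 * to_v2,
                      v2 * to_non_v2 + non_v2 * (1 - to_v2))
    return non_v2

shares = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
results = [lost_v2_after_one_generation(s, a=0.2, b=0.6, k=128) for s in shares]
for share, lost in zip(shares, results):
    print(share, round(lost, 3))
```

In this reduction the response to the foreign share depends on the ratio of `b` to `a`, that is, on how often each language produces sentences that are unambiguous evidence for its own parameter setting.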

6  Conclusions 

A learning theory (paradigm) attempts to account for how
children (the individual child) solve the problem of language
acquisition. By considering a population of such individual
“child” learners, we arrive at a model of emergent, global,
population language behavior. Consequently, whenever a linguist
proposes a new grammatical or learning theory, they are also
implicitly proposing a particular theory of language change, one
whose consequences need to be examined. In particular, we saw
that the gain of Verb second in the 3-parameter case did not
match historically observed patterns, but the 5-parameter system
did. In this way the dynamical systems model supports the
5-parameter linguistic system as an explanation of some changes
in French. We have also greatly sharpened the informal notions
of the time course of linguistic change, and grammatical
stability. Such evolutionary systems are, we believe, useful for
testing grammatical theories and explicitly modeling historical
language change.